Introduction



The increasing use of pen computing and the emergence of paper-based interfaces to networked
computing resources [10,11] has highlighted the need for techniques to [...] pen [...] and
subsequently search this data using hand-written or hand-drawn queries. However, searching [...]
production of handwriting and hand-drawn images, and thus methods for improving search accuracy
using domain-specific knowledge, constraints, and contextual information [...]. This [...]
discusses a number of novel techniques for improving the accuracy of [...].

Digital Ink Definition



Digital ink is a digital representation of the information generated by a pen-based input device.
Generally, digital ink is structured as a sequence of strokes that begin when the pen device makes
contact with a drawing surface and end when the pen device is lifted. Each stroke comprises a set of
sampled coordinates that define the movement of the pen whilst the pen is in contact with the
drawing surface.



Digital Ink Searching 

The traditional method of searching handwritten data is to first convert the ink database and
corresponding query to text using pattern recognition techniques, and then to match the query text
against the text in the database. Fuzzy text searching methods have been described [13] that perform
text matching in the presence of character errors similar to those produced by handwriting
recognition systems.

However, handwriting recognition accuracy remains low, and the number of errors introduced by
recognition (both for the database entries and for the handwritten query) means that this technique
does not work well. The process of converting ink into text results in the loss of a significant amount
of information regarding the general shape and dynamic properties of the ink. For example, some
[...] handwritten with a great deal of [...] in shape. Additionally, in many handwriting styles
(particularly cursive writing), the identification of individual characters is highly ambiguous.

Digital ink searching refers to the process of searching through a continuous stream of digital ink for 
patterns that most closely match the input query according to some similarity metric. Direct matching 
on raw digital ink allows shape mformation to be considered during the search procedure and does 
not require character or word segmentation to be performed. Techniques for digital ink searching
have been proposed in [11,12,13,14,15,16,17,18,19,20,21].



1.4 Digital Ink Search Applications 



A number of highly desirable applications are made possible by the combination of digital ink 
persistence and digital ink searching, including the ability to search annotations, notes, comments, 
and other handwritten information for keywords or phrases. The digital ink searching procedure is not 
limited to simply matching the query text, as additional attributes can be used to more accurately 
specify the desired information. Examples of these attributes include: date and time of writing, the
identity of the pen used to produce the writing, geographic location where the writing took place,
application with which the writing is associated (e.g. electronic mail or notebook), type of field that
contains the writing (e.g. a text input field, a drawing field), the location of the annotation or text on
the page, and so on. 

Pen-based queries also allow searching for information other than handwriting. Hand-drawn picture
searching can be used to locate drawings and diagrams in a notebook, and can be used to search a
collection of digital images. As an example, a hand-drawn picture query could be used to search an
online photo album or commercial image library for pictures that contain a desired visual feature or
set of visual features [15].

1.5 Detailed Description of the Preferred Embodiments 

In the preferred embodiment, the invention is configured to work with the Netpage networked
computer system, a detailed description of which is given in our co-pending applications, including in
particular PCT application WO0242989 entitled "Sensing Device" filed 30 May 2002, PCT
application WO0242894 entitled "Interactive Printed" filed 30 May 2002, PCT application
WO0214075 "Interface Surface Printer Using Invisible Ink" filed 21 February 2002, PCT application
WO0242950 "Apparatus For Interaction With A Network Computer System" filed 30 May 2002, and
PCT application WO03034276 entitled "Digital Ink Database Searching Using Handwriting Feature
Synthesis" filed 24 April 2003. It will be appreciated that not every implementation will necessarily
embody all or even most of the specific details and extensions described in these applications in
relation to the basic system. However, the system is described in its most complete form to assist in
understanding the context in which the preferred embodiments and aspects of the present invention
operate.

In brief summary, the preferred form of the Netpage system provides an interactive paper-based
interface to online information by utilizing pages of invisibly coded paper and an optically imaging
pen. Each page generated by the Netpage system is uniquely identified and stored on a network
server, and all user interaction with the paper using the Netpage pen is captured, interpreted, and
stored. Digital printing technology facilitates the on-demand printing of Netpage documents,
allowing interactive applications to be developed. The Netpage printer, pen, and network
infrastructure provide a paper-based alternative to traditional screen-based applications and online
publishing services, and support user-interface functionality such as hypertext navigation and form
input.

Typically, a printer receives a document from a publisher or application provider via a broadband
connection and prints it with an invisible pattern of infrared tags that each encode the location
of the tag on the page and a unique page identifier. As a user writes on the page, the imaging pen
decodes these tags and converts the motion of the pen into digital ink. The digital ink is transmitted
over a wireless channel to a relay base station, and then sent to the network for processing and
storage. The system uses a stored description of the page to interpret the digital ink, and performs the
requested actions by interacting with an application.

Applications provide content to the user by publishing documents, and process the digital ink 
interactions submitted by the user. Typically, an application generates one or more interactive pages 
in response to user input, which are transmitted to the network to be stored, rendered, and finally
printed as output to the user. The Netpage system allows sophisticated applications to be developed 
by providing services for document publishing, rendering, and delivery, authenticated transactions 
and secure payments, handwriting recognition and digital ink searching, and user validation using 
biometric techniques such as signature verification. 

2 Domain-Specific Specialization



Many of the digital ink searching algorithms described above are designed to search a specific type of
digital ink. For example, the systems proposed in [23,24] are most effective when searching printed
or cursive handwritten Latin-script text, whilst [12,14,16] describe techniques for searching
hand-drawn pictures. Similarly, systems can be developed that are optimised for searching other
specific types of digital ink, such as oriental handwritten characters, technical drawings, or
hand-drawn equations.

In most cases, systems designed to search a specific form of digital ink will achieve greater accuracy
than general-purpose digital ink searching methods, since these systems are able to utilize
domain-specific knowledge when designing the ink searching algorithms. Knowledge of the expected
digital ink format will influence the selection of segmentation techniques, pre-processing and
normalization, the pattern primitives used (e.g. stroke, sub-stroke, stroke group, bitmap image, etc.),
the extracted feature set, the matching algorithm, the similarity metric, and so on. The steps required
to perform digital ink searching using specialization are given in Figure 1.
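
The design choices listed above can be collected into one record per ink type, with a general-purpose record used when the type is unknown. The following is only a sketch under stated assumptions: the names `SearchStrategy`, `STRATEGIES`, `GENERAL`, and `strategy_for`, and every field value, are hypothetical placeholders and are not taken from the cited systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchStrategy:
    """Choices that knowledge of the ink format may influence (values are illustrative)."""
    segmentation: str
    normalization: str
    primitive: str      # e.g. stroke, sub-stroke, stroke group, bitmap image
    features: tuple
    matcher: str
    similarity: str


# Hypothetical fallback used when the ink type is not known.
GENERAL = SearchStrategy("none", "size", "stroke", ("shape",), "generic", "distance")

# Hypothetical specialized entries, keyed by an assumed ink-type label.
STRATEGIES = {
    "latin_cursive": SearchStrategy(
        "lines and words", "baseline and size", "sub-stroke",
        ("ascenders", "descenders", "zones"), "ordered", "distance"),
    "drawing": SearchStrategy(
        "none", "size and rotation", "stroke group",
        ("lines", "shapes"), "unordered", "distance"),
}


def strategy_for(ink_type):
    """Return the specialized strategy for ink_type, or the general one."""
    return STRATEGIES.get(ink_type, GENERAL)
```

Keeping the fallback explicit means a search can always proceed, with specialization applied only when the ink type has been determined.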

2.1 Specialization Examples 

For searching cursive Latin-script handwriting, techniques can be developed to exploit the key
characteristics of this type of writing, such as the powerful discriminatory influence of ascender and
descender elements (e.g. "bdfghjklpqty"), the existence of specific zones within the writing (base
lines and core lines), and the relatively stable ordering of the handwritten strokes (at least within the
writing of a single author). Additional high-level information can also be utilized, such as the
expectation that the writing will be clustered into approximately linear lines that contain groups of
strokes representing words and letters. Further specialization is possible if it is known that the
matching digital ink is largely numeric (e.g. a phone number), since digits are usually drawn
consistently, being well segmented (no ligatures) and with a regularity of character height.
Specialized search strategies are also possible for handwritten text that contains only upper-case
letters.

However, the requirements for accurately searching hand-drawn pictures and scribbles are
significantly different, and most of the key discriminatory characteristics of handwriting are not
available. Hand-drawn picture search algorithms must be stroke order and stroke direction
insensitive, due to the large number of different ways the same picture may be drawn. Generally, the
algorithm must also be rotationally insensitive, since drawings can be made at arbitrary orientations
on a page. To improve accuracy, picture searching algorithms may exploit the fact that most drawings
are rendered using an aggregation of line and shape primitives that may be used to decompose the
image into a canonical form useful for similarity matching.
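
As a small illustration of a feature that is insensitive to stroke order, stroke direction, and rotation, the sketch below builds a histogram of point distances from the ink centroid. This is a swapped-in example chosen for brevity, not a technique described in the cited work; the function name `radial_histogram` and the bin count are assumptions, and each stroke is assumed to be a list of (x, y) tuples.

```python
import math


def radial_histogram(strokes, bins=8):
    """Normalized histogram of point distances from the centroid of all ink points.

    Pooling all points discards stroke order and direction, and distances
    from the centroid do not change when the drawing is rotated.
    """
    points = [p for stroke in strokes for p in stroke]
    if not points:
        return [0.0] * bins
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    dists = [math.hypot(x - cx, y - cy) for x, y in points]
    scale = max(dists) or 1.0          # dividing by the largest distance removes size
    hist = [0] * bins
    for d in dists:
        hist[min(int(d / scale * bins), bins - 1)] += 1
    return [count / len(points) for count in hist]
```

Two drawings can then be compared by any distance between their histograms. A feature this coarse discards most shape detail, so it would at best serve as a first filter before a more discriminating match.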

Other domain-specific specializations for digital ink search can also be made. For example, systems
for searching oriental handwritten characters can utilize the highly accurate character segmentation
techniques that have been developed for oriental character recognition systems [26]. In addition to
this, they may exploit the fact that the characters are generally composed from a small set of
primitive radicals, whilst compensating for the potentially large stroke-order variation that can occur
during writing.

Additional specializations exist for other types of digital ink data, such as hand-drawn equations, 
diagrams, and charts. In general, specializations can be made for any type of digital ink data that 
contains a structure or regularity that may be exploited to provide improved discriminatory features. 
An awareness of the constraints and expected deviation of the data can be used to differentiate noise
from information, and thus provide a more accurate similarity metric.

2.2 Using Specialized Searching 

Having a set of specialized searching strategies is only useful if it can be accurately determined when
each particular strategy should be used. In the simplest case, the determination is made at a system
level; for example, allowing a system administrator to select Latin-script based searching or oriental
character searching depending on the location or expected users of the system. It is also possible for
this decision to be made automatically, given the existence of metrics that can accurately differentiate
between Latin-based and oriental scripts [25,27]. Similar techniques exist to differentiate written text
from hand-drawn images.

A more flexible system allows individual segments of digital ink to be labelled as a specific digital
ink type, and subsequently searched using algorithms specialized for that particular type. For
example, the system may allow a user to indicate that they generally write using a specific language
(e.g. in English or Chinese) or writing style (e.g. cursive, printed, upper-case, or mixed) and this 
information can be used to select the appropriate ink searching mechanism. In addition to this, the 
system may allow the user to manually indicate the type of digital ink being generated. For example, 
the user could use a number of different pens (e.g. one for handwriting text and another for drawing 
pictures) allowing the system to discriminate between the different ink types. Alternatively, gestures 
or other user-initiated actions could be used to label ink data. 

Another approach to specialized digital ink searching is to require the manual selection of the search
method when the search query is generated. For example, if the user wishes to search for English
handwritten text, they write their text query, and then indicate to the system that an English
handwritten text search should be performed using the specified query. Similarly, if the user wishes
to search for a hand-drawn picture, they draw their query and indicate to the system to perform a
drawing search. Since most digital ink searching systems perform some kind of pre-processing or
indexing at the time of ink generation (rather than when the query is generated) to ensure a fast
response to ink search queries, delaying the search strategy decision until the point at which the
search is initiated means that either:

• the ink data must be pre-processed multiple times and stored in multiple formats (i.e. once
for each search strategy), or

• the pre-processing must be delayed until the search is initiated (thus increasing the time it
takes to generate the search results).

The improvement in the accuracy of the ink search may justify the increased resource utilization
required by this technique.
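
The two alternatives can be contrasted with a small index that either pre-processes ink for every strategy when it is stored, or defers the work until a search asks for it. This is a sketch only: the class `InkIndex`, its methods, and the idea of caching deferred results after first use are assumptions made for illustration.

```python
class InkIndex:
    """Stores raw ink plus per-strategy pre-processed forms.

    preprocessors maps a strategy name to a function that turns raw ink
    into the indexed form that strategy needs.
    """

    def __init__(self, preprocessors, eager=True):
        self.preprocessors = preprocessors
        self.eager = eager
        self.raw = {}
        self.cache = {}     # (ink_id, strategy) -> indexed form

    def add(self, ink_id, ink):
        self.raw[ink_id] = ink
        if self.eager:
            # Pay storage and processing cost once per strategy, up front.
            for name, preprocess in self.preprocessors.items():
                self.cache[(ink_id, name)] = preprocess(ink)

    def indexed(self, ink_id, strategy):
        key = (ink_id, strategy)
        if key not in self.cache:
            # Deferred case: the cost is paid when the search is initiated.
            self.cache[key] = self.preprocessors[strategy](self.raw[ink_id])
        return self.cache[key]
```

With `eager=True` every query finds its indexed form ready, at the cost of one stored format per strategy; with `eager=False` storage stays small but the first query for each strategy is slower.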



3 Specialization Using Context Information 

In addition to the techniques described above, the application of specialized digital ink searching
techniques can be determined from the context (i.e. the contents of the page or document on which
the ink was written) of the digital ink. Interpreting the information contained in the layout and
definition of a document can guide the selection of the ink search strategy.

3.1 Language/Script Identification 

It is reasonable to assume that annotations and comments made on a printed document will usually be
written in the same language as the text contained in the document itself. Thus, if the natural
language of a document (i.e. the language that the text in the document was written in) can be
determined, specialized ink search strategies can be used to search digital ink annotations contained
on that document.



Many document formats allow the explicit definition of the natural language of the document. For
example, in HTML/XHTML [6,7] the "lang" attribute can be used:

<HTML lang="en" dir="rtl"></HTML>

where the language is identified by a two-letter code (e.g. "en" for English, "es" for Spanish, etc.) as
specified in [1,2]. The example also shows the ability to specify the text direction ("dir") as
right-to-left ("rtl") or left-to-right ("ltr"), another assumed characteristic of the digital ink that can
be used when performing digital ink searching. Similarly, in the XML/XFORMS document specification "a
special attribute named "xml:lang" may be inserted in documents to specify the language used in the
contents and attribute values of any element in an XML document" [4]:

<TITLE xml:lang="fr">XForms en XHTML</TITLE>
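
Reading these attributes from a document is straightforward with a standard parser. The sketch below uses Python's `html.parser` and `xml.etree.ElementTree` modules; the helper names `html_language` and `xml_language` are hypothetical, and only the root element is inspected, which is a simplification since the attribute may appear on any element.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


class _LangParser(HTMLParser):
    """Records the lang and dir attributes of the first <html> tag seen."""

    def __init__(self):
        super().__init__()
        self.lang = None
        self.direction = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names.
        if tag == "html" and self.lang is None:
            attributes = dict(attrs)
            self.lang = attributes.get("lang")
            self.direction = attributes.get("dir")


def html_language(markup):
    """Return (lang, dir) declared on the <html> element, or (None, None)."""
    parser = _LangParser()
    parser.feed(markup)
    return parser.lang, parser.direction


def xml_language(markup):
    """Return the xml:lang value of the root element, or None."""
    return ET.fromstring(markup).get(XML_LANG)
```

The returned language code and direction could then be used to choose the ink search strategy for annotations made on that document.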

The Adobe Portable Document Format (PDF) defines the "Lang" attribute, a "language identifier
specifying the natural language for all text" [5]. The identifier can be used in the document catalog
(thus specifying the language of the entire document), in any structured element, or in
marked-content sequences:

/Span << /Lang (fr) >>
BDC
(Bonjour.) Tj
EMC


Documents may also use the Dublin Core metadata element set, "a standard for cross-domain
information resource description" [28] that identifies the language associated with a resource using
the standard language codes [2,3]. Dublin Core metadata conforms to the World Wide Web
Consortium (W3C) Resource Description Framework, and can be used with HTML and XML
documents [29,30].

If a document format does not allow the specification of the document language, or the language
specification attribute is missing, the language of the document may be inferred using other
techniques. For example, the use of a particular font will often imply that the document was authored
in a particular language or script. In some document formats (such as PDF [5]), font objects contain a
language attribute that indicates the natural language of the font. In addition to this, there exist
techniques that allow the language of a document to be accurately determined using dictionaries.
Note that some specialized digital ink searching techniques are optimised for a specific script (e.g.
Latin characters, Oriental characters, Arabic characters, etc.) that includes a group of languages
rather than being language specific. Obviously, any technique that exploits language identification for
[...] specialization, since identification of the
language script is usually trivial once the language is known.
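
Once the language is known, finding the script is a table lookup. The following sketch assumes a small hand-written table; the name `SCRIPT_BY_LANGUAGE`, the function `script_for`, and the choice of entries are illustrative only, and the script labels follow the terms used in this section.

```python
# Hypothetical table from two-letter language code to script family.
SCRIPT_BY_LANGUAGE = {
    "en": "Latin",
    "es": "Latin",
    "fr": "Latin",
    "ar": "Arabic",
    "zh": "Oriental",
    "ja": "Oriental",
}


def script_for(language_tag):
    """Map a language tag such as "en" or "en-US" to a script, or None if unknown."""
    primary = language_tag.split("-")[0].lower()
    return SCRIPT_BY_LANGUAGE.get(primary)
```

Returning None for an unknown language leaves the caller free to fall back to a general-purpose search.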

3.2 Field Labels 

Documents and forms that require data to be entered, either using a keyboard for screen-based
[...] or paper documents, [...] give the [...]
indication of the type of information that is required. This is usually done by labelling each data input
[...] identifier, for example, "First Name", "Last Name", "Address",
"Phone Number", and so on. For printed forms, this information appears as printed text on the paper,
while online (i.e. computer-based) documents usually contain this information as a visible [...]
defined in the structured description of the form.

The information contained in the field labels described above can be used to determine the digital ink
[...] appropriate [...] by analysing the form description to associate
[...] field. [...] table of previously
defined field label strings is searched (possibly using regular expression matching) and the
[...] ink search strategy is found. The following example
[...]



Ink Type    Field Label

Text        First Name, Given Name, Surname, Family Name, Address, Suburb, Town,
            State, Country, Region, Email Address

Numeric     Phone Number, Age, Number, Size, Count, Zip Code, Post Code, Date, Time,
            Credit Card Number, Customer Number

Drawing     Picture, Drawing, Image, Diagram

3.3 Field Attributes

In addition to the field type, form definitions often contain information regarding the type of data that
should be entered in each field. This information is usually contained in attributes that are associated
with a specific field. For example, some input field types have a flag indicating that the value entered
must be numeric. A digital ink searching system can use this information to select a numeric search
strategy for ink contained in the associated data input area.

In addition to using standard form field attributes to improve the accuracy of digital ink searching,
digital ink search specific information can be added to fields using custom attributes. This
information is only used if the document is processed using a digital ink searching system; the
document can still be used normally where required (e.g. printed or displayed in a web browser) since
processing systems generally ignore the unrecognised custom attributes. However, if digital ink
searching is required, the custom parameters can be used to improve the accuracy of the search
results.
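
As an illustration, the sketch below reads both kinds of attribute from an HTML form. The custom attribute name `data-ink-type` is invented for this example, as are the class `FieldInkTypes` and the helper `field_ink_types`; treating `type="number"` as the numeric flag and defaulting unmarked fields to text are likewise assumptions.

```python
from html.parser import HTMLParser


class FieldInkTypes(HTMLParser):
    """Collects an ink type for each named <input> field in a form."""

    def __init__(self):
        super().__init__()
        self.ink_types = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name is None:
            return
        if "data-ink-type" in attributes:
            # Hypothetical custom attribute; ordinary processors ignore it.
            self.ink_types[name] = attributes["data-ink-type"]
        elif attributes.get("type") == "number":
            # Standard attribute indicating a numeric value.
            self.ink_types[name] = "numeric"
        else:
            self.ink_types[name] = "text"


def field_ink_types(markup):
    """Return a dict mapping field name to ink type for the given form markup."""
    parser = FieldInkTypes()
    parser.feed(markup)
    return parser.ink_types
```

The custom attribute takes precedence over the standard one, so a form author can override the default choice for a particular field.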



4 Conclusion 



A number of techniques to improve the accuracy of digital ink search were discussed, including a
method of selecting a digital ink searching strategy from a set of specialized strategies based on the
expected ink data type. Examples of specialized digital ink searching strategies were given, along
with a number of methods for integrating these strategies into a system. In addition to this, methods
for selecting digital ink search strategies based on the layout and definition of a document were
discussed.

Figure 1. Digital Ink Searching Using Specialization